Analysis of Household Energy Usage for 3 Swiss Households
Data
This project examines the Electricity Consumption and Occupancy (ECO) dataset. The consists of energy measurements from 3 swiss households over a period of 8 months. For each household, the dataset consists of readings from a smart meter as well as the specific energy consumption of 7 different appliances. The data was aggregated so that each observation is the total consumption for one day from each household. All measurements were converted to kilowatt-hours (kWh).
The purpose of this analysis is to construct useful interactive visualizations to effectively answer the data science questions below. Specifically, the plotly and altair libraries are used to create interactive visualizations. A few “sketches” of the data are created with seaborn before constructing the final interactive charts.
Data Science Questions
Which appliances consume the most energy on a daily basis? How does the relative consumption differ between households?
Does total energy consumption change throughout the year? Does this differ between the three phases of energy consumption?
Data Preparation
The first step in our analysis is to prepare the data for analysis. In order to analyze the data, we need to get it into a useable format. The raw data is structured in separate files. For each household, there is a folder for the meter data and the plug data. The meter data contains the recordings from the meter for every day in a separate csv file. We combined the 600 separate csv files into a single dataframe using the following procedure:
Read a single day’s data into a dataframe.
For each csv file, calculate the daily energy usage in kilowatt hours (kWh)
To convert to daily kWh: Daily energy consumption (kWh) = (Total watts used in a day / 1,000) / 3,600
Combine all csv files into a single dataframe
From the smart meter data, we are left with the daily energy usage per household for every date. Other features such as current, voltage, and variance of the measurements were initially considered but discarded since they are not the focus of this analysis.
The second source of data were the “plugs” for each house. For every house, the energy usage of a specific appliance is recorded in a separate folder, with one csv per day. The appliances recorded were:
Fridges
Dryers
Coffee Machines
Kettles
Washing machines
PCs
Freezers
As with the smart meter data, we converted the plug for each house into a single daily observation. For every date, plug, and household, we retrieved the energy usage in kWh using the calculation above and joined it to the smart meter dataset to match the respective date and household. The code used to manipulate the data is below.
Code
# For every subfolder in the ./data folder, if the suffix is _SM, read in every file as a dataframe# For each file, add the columns names below:# columns = ['powerallphases', 'powerl1', 'powerl2', 'powerl3', 'currentneutral', 'currentl1', 'currentl2', # 'currentl3', 'voltagel1', 'voltagel2', 'voltagel3', 'phaseanglevoltagel2l1', 'phaseanglevoltagel3l1', # 'phaseanglecurrentvoltagel1', 'phaseanglecurrentvoltagel2', 'phaseanglecurrentvoltagel3']# Then, take the mean and SD of each column so there is only one row# Append a data column that has the filename as a value# Append a household column that has the household as the value, which occurs before the underscore (e.g. H6, H5, or H4)# Join all rows together so that there is one dataframe with one day per each household# * powerallphases: Sum of real power over all phases# * powerl1: Real power phase 1# * powerl2: Real power phase 2# * powerl3: Real power phase 3# * currentneutral: Neutral current# * currentl1: Current phase 1# * currentl2: Current phase 2# * currentl3: Current phase 3# * voltagel1: Voltage phase 1# * voltagel2: Voltage phase 2# * voltagel3: Voltage phase 3# * phaseanglevoltagel2l1: Phase shift between voltage on phase 2 and 1# * phaseanglevoltagel3l1: Phase shift between voltage on phase 3 and 1# * phaseanglecurrentvoltagel1: Phase shift between current/voltage on phase 1# * phaseanglecurrentvoltagel2: Phase shift between current/voltage on phase 2 # * phaseanglecurrentvoltagel3: Phase shift between current/voltage on phase 3import pandas as pdimport osdata_dir ='./data'# Initialize an empty list to store the dataframesdfs = []# Loop through each subfolder in the data directoryfor folder in os.listdir(data_dir):if folder.endswith('_SM'):# Get the household ID from the folder name household = folder.split('_')[0]# Loop through each file in the subfolderfor filename in os.listdir(os.path.join(data_dir, folder)):if filename.endswith(".csv"):# Read in the file as a dataframe df = pd.read_csv(os.path.join(data_dir, folder, filename), header=None, names=['powerallphases', 'powerl1', 'powerl2', 'powerl3', 'currentneutral','currentl1', 'currentl2', 'currentl3', 'voltagel1', 'voltagel2', 'voltagel3','phaseanglevoltagel2l1', 'phaseanglevoltagel3l1', 'phaseanglecurrentvoltagel1','phaseanglecurrentvoltagel2', 'phaseanglecurrentvoltagel3'])# Calculate the mean and standard deviation for each column df_mean = df.sum().to_frame().transpose() df_std = df.std().to_frame().transpose()# Add a column for the filename and household df_mean['date'] = filename.strip('.csv') df_std['date'] = filename.strip('.csv') df_mean['household'] = household df_std['household'] = household# Add the dataframes to the list this_df = pd.merge(df_mean, df_std, on=['date','household'], suffixes = ('_daily', '_std')) dfs.append(this_df)# Concatenate all of the dataframes togetherresult = pd.concat(dfs, axis=0)# Make date and household the first two columnscols = ['date', 'household']cols.extend([col for col in result.columns if col notin cols])result = result.reindex(columns=cols)# Covert the power to kilowatts - powerallphases_daily,powerl1_daily,powerl2_daily,powerl3_dailyresult['powerallphases_daily'] = result['powerallphases_daily']/1000/3600result['powerl1_daily'] = result['powerl1_daily']/1000/3600result['powerl2_daily'] = result['powerl2_daily']/1000/3600result['powerl3_daily'] = result['powerl3_daily']/1000/3600# Drop all columns except date, household, powerallphases_daily, powerl1_daily, powerl2_daily, powerl3_dailyresult = result[['date', 'household', 'powerallphases_daily', 'powerl1_daily', 'powerl2_daily', 'powerl3_daily']]# Encode the date as a datetime objectresult['date'] = pd.to_datetime(result['date'])# Arrange by date and householdresult = result.sort_values(by=['date', 'household'])# Save the final dataframe to a CSV file#result.to_csv('output_SM.csv', index=False)# Sanity check - value counts for each householdresult['household'].value_counts()sm_data = result.copy()
The number of observations for each household in the dataset matches the number of days that they were recorded.
Code
# Plug data# Each household (H4, H5, H6) has a folder with the suffix _plugs# For each plug folder, there is a folder named 01 ... through 07. # Each of these folders has a separate csv for each day# We need to read in each of these files and calculate the sum of the power in kWh by summing and dividing by 3600000# Each plug should have a column with the date (from the filename) and household (from the folder)# Define a dictionary to map folder names to appliance typesappliance_map = {'01': 'fridge','02': 'dryer','03': 'coffee','04': 'kettle','05': 'washer','06': 'pc','07': 'freezer'}# Initialize an empty list to store the dataframesdfs = []# Loop through each household folderfor household_folder in os.listdir(data_dir):if household_folder.endswith('_plugs'): household = household_folder.split('_')[0]# Loop through each plug folderfor plug_folder in os.listdir(os.path.join(data_dir, household_folder)):# Get the appliance type from the dictionary appliance = appliance_map.get(plug_folder, 'unknown')if plug_folder !=".DS_Store": # Ignore the .DS_Store file# Loop through each CSV filefor filename in os.listdir(os.path.join(data_dir, household_folder, plug_folder)):if filename.endswith('.csv'):# Read in the CSV file as a dataframe df = pd.read_csv(os.path.join(data_dir, household_folder, plug_folder, filename), names=['power'])# Calculate the sum of power in kWh df = df.sum().to_frame().transpose() df['kWh'] = df['power']/3600000# Add columns for the date, household, plug, and appliance type df['date'] = pd.to_datetime(filename.split('.')[0]) df['household'] = household df['plug'] = plug_folder df['appliance'] = appliance# Append the dataframe to the list dfs.append(df)# Concatenate all of the dataframes togetherresult = pd.concat(dfs)# group by date and household, pivot by applianceresult = result.pivot(index=['date', 'household'], columns='appliance', values='kWh').reset_index()# Save the final dataframe to a CSV file#result.to_csv('output_plug.csv', index=False)plugs = result.copy()
Code
# Keep only yyyy-mm-dd and remove hh:mm:ssplugs['date'] = pd.to_datetime(plugs['date'].dt.date)
Code
# Join the sm data and plugs on the date and household, retaining all rows from bothdf = pd.merge(sm_data, plugs, on=['date', 'household'], how='outer')df = df.drop(columns=['unknown'])
Code
# Write the final dataframe to a CSV file# Drop the unknown appliance columndf.to_csv("power_data_clean.csv", index=False)
Exploratory Data Analysis
Finally, we have a tidy dataset on the date/house level. The shape of the dataset is 632x14, with 32 days from the plugs that did not occur in the smart meter dataset. The remaining columns are:
date: The day of the measurement
household: The household that the measurement was taken from (H4, H5, or H6)
powerallphases_daily: (kWh) The total energy usage for the day
powerl1_daily: (kWh) The energy usage for phase 1
powerl2_daily: (kWh) The energy usage for phase 2
powerl3_daily: (kWh) The energy usage for phase 3
coffee: (kWh) The daily energy usage for the coffee machine
fridge: (kWh) The daily energy usage for the fridge
dryer: (kWh) The daily energy usage for the dryer
kettle: (kWh) The daily energy usage for the kettle
washer: (kWh) The dailyenergy usage for the washing machine
pc: (kWh) The daily energy usage for the PC
freezer: (kWh) The daily energy usage for the freezer
To begin the exploratory data analysis, we can examine summary statistics, look at some of the distributions of variables as well as how the variables are correlated in the dataset. Some static visualizations will guide our final interactive visualizations.
Code
df.describe()
powerallphases_daily
powerl1_daily
powerl2_daily
powerl3_daily
coffee
dryer
freezer
fridge
kettle
pc
washer
count
600.000000
600.000000
600.000000
600.000000
329.000000
597.000000
551.000000
578.000000
566.000000
560.000000
591.000000
mean
15.414557
7.070770
2.719145
5.624328
0.315968
0.170527
0.579788
0.258389
0.195285
0.290460
1.884256
std
8.999032
5.265384
1.944670
3.936660
0.285509
0.103838
0.373696
0.291260
0.194002
0.327534
1.784528
min
1.864760
0.432333
0.002444
0.774391
-0.008834
-0.019447
-0.018895
-0.020608
-0.022533
-0.019892
0.044193
25%
8.630140
1.924706
1.188225
2.286015
0.093377
0.124683
0.148998
0.000002
0.087050
0.031047
0.626971
50%
15.749312
6.967787
2.413147
4.950535
0.288394
0.142218
0.704999
0.111247
0.131475
0.222543
1.067496
75%
20.149719
10.595041
3.596908
8.370188
0.471979
0.199236
0.846771
0.586220
0.286475
0.325684
4.225914
max
61.602174
34.199160
17.380410
25.077918
1.334717
0.619953
1.323844
0.979461
1.078786
1.856928
6.239601
A few observations from the summary statistics:
There are the fewest observations from the coffee plug
The greatest energy usage occurs in phase 1 on average. Phase 1 also has the greatest standard deviation.
The plug with the greatest daily usage, on average, is the washer.
Next, we will create some rudimentary visualizations of the distributions below:
Code
# Use seaborn pairplot to examine distributions and relationshipsimport matplotlib.pyplot as pltimport seaborn as snssns.pairplot(df, hue='household')
<seaborn.axisgrid.PairGrid at 0x17068ea70>
Another rough visualization that could help direct our final product is a heatmap.
Code
# Create a heatmap to demonstrate the relationships between the variablesplt.figure(figsize=(10,10))sns.heatmap(df.corr(), annot=True, fmt='.2f', cmap='coolwarm')
/var/folders/dm/hqy2vyf53pj_9vry3bc0lwrc0000gn/T/ipykernel_59707/654618798.py:3: FutureWarning:
The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
<AxesSubplot: >
Development of interactive visualizations
Visualization 1: Which appliances consume the most energy on a daily basis? How does the relative consumption differ between households?
A visualization that demostrates the distributions of each appliance’s daily energy usage per each household could help us answer this question.
Code
# Save the tidy dataframe to another variabletidy_df = pd.read_csv('power_data_clean.csv')# Pivot the data longer so the appliance types are in a single column and the power is in a single columndf = tidy_df.melt(id_vars=['date', 'household'], var_name='appliance', value_name='kWh')# Only keep the seven appliance typesdf = df[df['appliance'].isin(['fridge', 'dryer', 'coffee', 'kettle', 'washer', 'pc', 'freezer'])]
Code
import pandas as pdimport plotly.graph_objects as gofrom plotly.subplots import make_subplots# Define a dictionary to map appliance names to colorscolor_map = {'fridge': '#3A2E39','washer': '#1E555C','dryer': "#899795",'coffee': "#F4D8CD",'kettle': '#F1C5A8','pc': '#EDB183','freezer': '#F15152'}# Create the viz lists# First list, every 7th value is true, the rest are false, 21 values totalviz_list1 = [Trueif i %7==0elseFalsefor i inrange(21)]viz_list2 = [Trueif i %7==1elseFalsefor i inrange(21)]viz_list3 = [Trueif i %7==2elseFalsefor i inrange(21)] viz_list4 = [Trueif i %7==3elseFalsefor i inrange(21)]viz_list5 = [Trueif i %7==4elseFalsefor i inrange(21)]viz_list6 = [Trueif i %7==5elseFalsefor i inrange(21)]viz_list7 = [Trueif i %7==6elseFalsefor i inrange(21)]# Create a subplot for each householdfig = make_subplots(rows=1, cols=3, shared_yaxes='all', shared_xaxes='all', subplot_titles=['House 4', 'House 5', 'House 6'], x_title='Daily Energy Usage (kWh)', y_title='Number of Days')# Function whether to show legenddef show_legend(i):if i%3==0:returnTrueelse:returnFalse# Loop through each householdfor i, household inenumerate(['H4', 'H5', 'H6']):# Filter the dataframe by household household_df = df[df['household'] == household]# Create a histogram for each appliancefor appliance in household_df['appliance'].unique():# Filter the dataframe by appliance appliance_df = household_df[household_df['appliance'] == appliance]# Add a histogram trace to the subplot fig.add_trace( go.Histogram( x=appliance_df['kWh'], name=appliance, marker_color=color_map.get(appliance, 'gray'), nbinsx=30, legendgroup=appliance, showlegend = show_legend(i) ), row=1, col=i+1 )# Update the subplot layoutfig.update_layout( title='Daily Energy Usage for All Appliances by Household', barmode='overlay', xaxis=dict(range=[-0.2, 6.2]), yaxis=dict(range=[0, 210]))# Add a dropdown menu to filter by appliancefig.update_layout( updatemenus=[ go.layout.Updatemenu(type='dropdown', showactive=True, buttons=[ {'label': 'All', 'method': 'update', 'args': [{'visible': True}, {'title.text': 'Daily Energy Usage for All Appliances by Household'}]}, {'label': 'Coffee', 'method': 'update', 'args': [{'visible': viz_list1}, {'title.text': 'Daily Energy Usage for Coffee Machine by Household'}]}, {'label': 'Dryer', 'method': 'update', 'args': [{'visible': viz_list2}, {'title.text': 'Daily Energy Usage for Dryer by Household'}]}, {'label': 'Freezer', 'method': 'update', 'args': [{'visible': viz_list3}, {'title.text': 'Daily Energy Usage for Freezer by Household'}]}, {'label': 'Fridge', 'method': 'update', 'args': [{'visible': viz_list4}, {'title.text': 'Daily Energy Usage for Fridge by Household'}]}, {'label': 'Kettle', 'method': 'update', 'args': [{'visible': viz_list5}, {'title.text': 'Daily Energy Usage for Kettle by Household'}]}, {'label': 'PC', 'method': 'update', 'args': [{'visible': viz_list6}, {'title.text': 'Daily Energy Usage for PC by Household'}]}, {'label': 'Washer', 'method': 'update', 'args': [{'visible': viz_list7}, {'title.text': 'Daily Energy Usage for Washer by Household'}]} ] ) ])fig.update_traces(xbins=dict( # bins used for histogram start=0.0, end=6.5, size=0.1 ))fig.update_traces(marker_line_width=1,marker_line_color="black", opacity=0.8)fig.update_layout(hovermode="x unified")# Show the plotfig.show()
I will highlight some of the specific choices made in the results section below. The above chart shows the number of days each compliance used that approximate amount of energy.
Visualization 2: Does total energy consumption change throughout the year? Does this differ between the three phases of energy consumption?
For the next visualization, I would like to examine how energy usage changes over the year by phase. We will start by aggregating across the three households by day and renaming some of the variables for clarity in the labels.
Code
# Pivot powerl1_daily powerl2_daily powerl3_daily to long format and create a "phase" columndf_phase = pd.melt(tidy_df, id_vars=['date', 'household'], var_name='phase', value_name='kWh')# Only keep the 'powerl1_daily', 'powerl2_daily', and 'powerl3_daily' and powerallphases_daily rowsdf_phase = df_phase[df_phase['phase'].isin(['powerl1_daily', 'powerl2_daily', 'powerl3_daily'])]# Start by averaging each datedf_phase = df_phase.groupby(['date', 'phase']).mean().reset_index()# Remove the one NA datedf_phase = df_phase.dropna()# Show headdf_phase.head()# Recode the phases to be Phase 1, Phase2, and Phase3df_phase['phase'] = df_phase['phase'].replace({'powerl1_daily': 'Phase 1', 'powerl3_daily': 'Phase 3', 'powerl2_daily': 'Phase 2'})
/var/folders/dm/hqy2vyf53pj_9vry3bc0lwrc0000gn/T/ipykernel_59707/635431084.py:8: FutureWarning:
The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
Next, we will construct the chart using altair’s linked views.
Code
import altair as alt# Interval for date selectioninterval = alt.selection_interval(encodings=['x'])# Y axis scaleyscale = alt.Scale(domain=(0, 22))# Define custom color schemecolor_scale = alt.Scale(domain=['Phase 1', 'Phase 2', 'Phase 3'], range=['#F1C5A8', '#F15152', '#1E555C'])# Base chart - usage of phases over timebase = alt.Chart(df_phase).mark_line(size=2).encode( x=alt.X('date:T', title ="Date"), y=alt.Y('kWh:Q', scale = yscale), color = alt.Color('phase:N', title ="Phase", scale=color_scale))# Top chart - zoomed in on selected intervalchart = base.encode( x=alt.X('date:T', scale=alt.Scale(domain=interval.ref()), title ="Date"), y=alt.Y('kWh:Q', title ="Daily Energy Usage (kWh)")).properties( width=800, height=300, title ="Average Daily Energy Usage of Three Households July 2012 - Jan 2013")# Bottom chart - brush to select intervalview = base.add_selection( interval).properties( width=800, height=50, title ="Click and Drag Below to Select a Time Interval:")# Add a histogram that reflects the mean of each phase during that intervalhist = ( base.mark_bar() .encode( y=alt.Y("kWh:Q", aggregate='mean', stack=None, title="Mean Energy (kWh)", scale=yscale), x=alt.X('phase:N', stack=None, title="Phase", axis = alt.Axis(labelAngle=-45)), color=alt.Color("phase:N", scale=color_scale), ) .transform_filter(interval) .properties( width=100, height=400, title ="Mean Energy Usage (kWh) in Selection" ))alt.hconcat(hist, alt.vconcat(chart, view), center =True)
/Users/bren/opt/anaconda3/envs/anly503/lib/python3.10/site-packages/altair/utils/core.py:317: FutureWarning:
iteritems is deprecated and will be removed in a future version. Use .items instead.
Results
Now I will explain the specific design choices that went into designing the two final visualizations.
Visualization 1: Daily Energy Usage By Appliance for Each Household
Code
import pandas as pdimport plotly.graph_objects as gofrom plotly.subplots import make_subplots# Define a dictionary to map appliance names to colorscolor_map = {'fridge': '#3A2E39','washer': '#1E555C','dryer': "#899795",'coffee': "#F4D8CD",'kettle': '#F1C5A8','pc': '#EDB183','freezer': '#F15152'}# Create the viz lists# First list, every 7th value is true, the rest are false, 21 values totalviz_list1 = [Trueif i %7==0elseFalsefor i inrange(21)]viz_list2 = [Trueif i %7==1elseFalsefor i inrange(21)]viz_list3 = [Trueif i %7==2elseFalsefor i inrange(21)] viz_list4 = [Trueif i %7==3elseFalsefor i inrange(21)]viz_list5 = [Trueif i %7==4elseFalsefor i inrange(21)]viz_list6 = [Trueif i %7==5elseFalsefor i inrange(21)]viz_list7 = [Trueif i %7==6elseFalsefor i inrange(21)]# Create a subplot for each householdfig = make_subplots(rows=1, cols=3, shared_yaxes='all', shared_xaxes='all', subplot_titles=['House 4', 'House 5', 'House 6'], x_title='Daily Energy Usage (kWh)', y_title='Number of Days')# Function whether to show legenddef show_legend(i):if i%3==0:returnTrueelse:returnFalse# Loop through each householdfor i, household inenumerate(['H4', 'H5', 'H6']):# Filter the dataframe by household household_df = df[df['household'] == household]# Create a histogram for each appliancefor appliance in household_df['appliance'].unique():# Filter the dataframe by appliance appliance_df = household_df[household_df['appliance'] == appliance]# Add a histogram trace to the subplot fig.add_trace( go.Histogram( x=appliance_df['kWh'], name=appliance, marker_color=color_map.get(appliance, 'gray'), nbinsx=30, legendgroup=appliance, showlegend = show_legend(i) ), row=1, col=i+1 )# Update the subplot layoutfig.update_layout( title='Daily Energy Usage for All Appliances by Household', barmode='overlay', xaxis=dict(range=[-0.2, 6.2]), yaxis=dict(range=[0, 210]))# Add a dropdown menu to filter by appliancefig.update_layout( updatemenus=[ go.layout.Updatemenu(type='dropdown', showactive=True, buttons=[ {'label': 'All', 'method': 'update', 'args': [{'visible': True}, {'title.text': 'Daily Energy Usage for All Appliances by Household'}]}, {'label': 'Coffee', 'method': 'update', 'args': [{'visible': viz_list1}, {'title.text': 'Daily Energy Usage for Coffee Machine by Household'}]}, {'label': 'Dryer', 'method': 'update', 'args': [{'visible': viz_list2}, {'title.text': 'Daily Energy Usage for Dryer by Household'}]}, {'label': 'Freezer', 'method': 'update', 'args': [{'visible': viz_list3}, {'title.text': 'Daily Energy Usage for Freezer by Household'}]}, {'label': 'Fridge', 'method': 'update', 'args': [{'visible': viz_list4}, {'title.text': 'Daily Energy Usage for Fridge by Household'}]}, {'label': 'Kettle', 'method': 'update', 'args': [{'visible': viz_list5}, {'title.text': 'Daily Energy Usage for Kettle by Household'}]}, {'label': 'PC', 'method': 'update', 'args': [{'visible': viz_list6}, {'title.text': 'Daily Energy Usage for PC by Household'}]}, {'label': 'Washer', 'method': 'update', 'args': [{'visible': viz_list7}, {'title.text': 'Daily Energy Usage for Washer by Household'}]} ] ) ])fig.update_traces(xbins=dict( # bins used for histogram start=0.0, end=6.5, size=0.1 ))fig.update_traces(marker_line_width=1,marker_line_color="black", opacity=0.8)fig.update_layout(hovermode="x unified")# Show the plotfig.show()
The goal of this visualization was to determine how much power each appliance was using. Further, I wanted to separate the graphs by household in case there were significant differences in usage between the households. In order to gain insight into how much energy each appliance was using, I encoded each appliance to a different color, and each household to a different subplot. I found that this would be more effective than the opposite: having house encoded by color and each appliance having its own facet. Since there are only 3 households - by extension 3 facets - this makes for a more compact visualization.
The interactive features are used to enhance the communication of information. The dropdown menu allows the user to filter the data by appliance, allowing for a clearer view of each histogram. This is especially useful for comparing the appliance usage between households. While having all appliances plotted at once can help one determine the general differences in household usage, viewing separate appliances can facilitate comparisons on that level. For example, if we select “washer”, we can determine that one household utilizes their washer very heavily (house 4), wheres the other two tend to have more normally distributed usages. We can also see that house 4 most likely does not use a pc, since the usage is 0 kWh for every day.
Finally, there is a tooltip feature that is useful for comparing between appliances. If we hover over any portion of the graph, we are able to get the histogram counts for each of the appliances.
Overall, this visualization is effective for comparing energy usage between appliances and households.
Visualization 2: Energy usage by phase over time
Code
import altair as altalt.data_transformers.enable('default',max_rows=None)# Interval for date selectioninterval = alt.selection_interval(encodings=['x'])# Y axis scaleyscale = alt.Scale(domain=(0, 22))# Define custom color schemecolor_scale = alt.Scale(domain=['Phase 1', 'Phase 2', 'Phase 3'], range=['#F1C5A8', '#F15152', '#1E555C'])# Base chart - usage of phases over timebase = alt.Chart(df_phase).mark_line(size=2).encode( x=alt.X('date:T', title ="Date"), y=alt.Y('kWh:Q', scale = yscale), color = alt.Color('phase:N', title ="Phase", scale=color_scale))# Top chart - zoomed in on selected intervalchart = base.encode( x=alt.X('date:T', scale=alt.Scale(domain=interval.ref()), title ="Date"), y=alt.Y('kWh:Q', title ="Daily Energy Usage (kWh)")).properties( width=800, height=300, title ="Average Daily Energy Usage of Three Households by Phase July 2012 - Jan 2013")# Bottom chart - brush to select intervalview = base.add_selection( interval).properties( width=800, height=50, title ="Click and Drag Below to Select a Time Interval:")# Add a histogram that reflects the mean of each phase during that intervalhist = ( base.mark_bar() .encode( y=alt.Y("kWh:Q", aggregate='mean', stack=None, title="Mean Energy (kWh)", scale=yscale), x=alt.X('phase:N', stack=None, title="Phase", axis = alt.Axis(labelAngle=-45)), color=alt.Color("phase:N", scale=color_scale), ) .transform_filter(interval) .properties( width=100, height=400, title ="Mean Energy Usage (kWh) in Selection" ))#hist | (chart & view) alt.hconcat(hist, alt.vconcat(chart, view), center =True)
The second visualization depicts energy usage over time for each phase. The total power consumption is measured as the sum of the three phases. The three phases could be helpful to monitor for energy professionals. My understanding is that in a three-phase system, balancing the load across the three phases is important. Here, we can see that the phases are not equally loaded, and that phase 1 tends to be the highest. There are some periods where phase 3 is the highest, and briefly the lowest. More detailed interpretation would need to be conducted by someone with more domain knowledge. The other (more general) function of this graph is to see how energy consumption changes over time. We can see that the month of december has the highest energy usage, which could be due to electric heating.
There are two primary interactive features: The selection at the bottom, and the reactive plot on the left. The selection at the bottom allows the user to select a time period of interest, and the units on the larger chart above update automatically. This permits for more granular visualization - although in this case, we have aggregated to the daily level, so that is the smallest unit. The reactive plot calculates the mean of each phase during the time period. This is helpful for the user to understand the approximate levels of energy used by each phase during that period.
Conclusion
The first visualization communicated the energy usage of individual appliances in each household. Interactivity and faceting were used to facilitate comparisons between households and appliances. The second visualization reflected the energy usage for each phase over time, and combined a subsetting view with a reactive histogram. This second visualization was helpful for understanding the trends in energy usage over custom time periods.
References
Christian Beckel, Wilhelm Kleiminger, Romano Cicchetti, Thorsten Staake, and Silvia Santini The ECO Data Set and the Performance of Non-Intrusive Load Monitoring Algorithms. Proceedings of the 1st ACM International Conference on Embedded Systems for Energy-Efficient Buildings (BuildSys 2014). Memphis, TN, USA. ACM, November 2014.
Wilhelm Kleiminger, Christian Beckel, Silvia Santini Household Occupancy Monitoring Using Electricity Meters. Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2015). Osaka, Japan, September 2015.